Content Grabber for VWR license holders


Hi everyone, and welcome to the Sequentum webinar. Today we are showcasing our third-generation product, Content Grabber Enterprise. In case you are not aware, I'm going to tell you a little bit about Sequentum and our story.

We started in 2010 with our first-generation product, Visual Web Ripper. Our CTO started this out of Australia, and over the course of supporting many hundreds of customers he realized that he had a lot of work to do, so he re-architected and rebuilt the product from the ground up as Content Grabber, which he released in 2015.

That got the attention of quantitative and systematic hedge fund investors in New York City. Our founders were flown to New York, and roquant Ventures, which is the VC arm of a quantitative systematic hedge fund, invested in the company. Since then, in 2018, we set up a headquarters in New York City and a center of excellence in Gurgaon, India, where we do a lot of the agent development and product development.

We've grown to 40 employees, and in August of this year we were very proud to announce the release of our third-generation product, Content Grabber Enterprise, which I'm going to be showing you today.

Content Grabber Enterprise has three components. It has the desktop, the integrated development environment, which is where you write and maintain all of your agents. It has the servers, which are your workhorses that execute all your jobs. And it has an Agent Control Center in the center that allows you to manage your agents, your agent versions, the deployments, the runs, the schedules, everything centrally, including your proxies and tickets. We also have a web portal that allows you to see the history of runs for all of your agents, including key details like


success criteria, the number of page loads, the number of errors, the data count per run per date, and any open tickets associated with those agents.

All right, so now I'm going to jump right in and give you a demo of the desktop, and I'll show you a little bit of the Agent Control Center as well.

The desktop, as you may know, is a Windows desktop, and in it we have a custom-built version of the Chromium browser. Just to give you a sense of how it works, I'm going to bring up a website that I want to write an agent for. What I'm doing is bringing it up inside our custom-built browser inside the tool, and you can see that as I mouse around the page, it highlights various elements that I might want to click on. I'm going to click on this Administrative Support category, and you see Content Grabber knows automatically the types of things that you're going to want to do with that.


This time I'm actually just going to click on it. Now you can see in the top left corner here that it has opened a second tab. It's keeping track of the flow of your agent as it goes through the site and loads dependent pages that provide more detail. So this is great so far: it has automatically created your workflow, and it has automatically created a schema in the background for you.

Now I'm going to start adding to that schema. This is a list, a very common structure in our field of web data extraction. I'm going to use my mouse to mouse over and click on the title of one of these list items, then mouse over another one while holding down the Shift key, and it's automatically going to create a list item. I'm just going to do a quick scroll down to see whether it missed anything. No, it got the whole list, just like that. Okay, I'm going to add that command. What it has done is create the list item: it automatically detected all of the items in the list on that page, and it created a click-through link to go to the detail page, which I'm going to do.
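
To make the two-click list selection more concrete, here is a minimal sketch of one way a tool could generalize from two sample elements to a whole list. This is an assumption for illustration only, not Content Grabber's actual algorithm: the function name `generalize_xpath` and the sample paths are hypothetical, and the idea is simply to drop the positional index at the step where the two selected paths differ.

```python
import re


def generalize_xpath(first: str, second: str) -> str:
    """Drop the positional index at every step where the two XPaths differ.

    Hypothetical helper: both paths must have the same depth and may differ
    only in a trailing [n] index on a step.
    """
    steps_a, steps_b = first.split("/"), second.split("/")
    if len(steps_a) != len(steps_b):
        raise ValueError("paths must have the same depth")

    generalized = []
    for step_a, step_b in zip(steps_a, steps_b):
        if step_a == step_b:
            generalized.append(step_a)
            continue
        # Strip a trailing positional index such as [1] or [4].
        name_a = re.sub(r"\[\d+\]$", "", step_a)
        name_b = re.sub(r"\[\d+\]$", "", step_b)
        if name_a != name_b:
            raise ValueError("paths differ by more than a positional index")
        generalized.append(name_a)
    return "/".join(generalized)


# Two hypothetical titles picked from the same list, as in the demo.
print(generalize_xpath("/html/body/ul/li[1]/a", "/html/body/ul/li[4]/a"))
```

With the index removed, the resulting path matches every item in the list rather than only the two that were clicked.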


Again, see, it loaded another tab for that page. It's clearly delineating between the different dependent pages that I'm loading in my workflow. I'm going to go here and get the title. You see how it's creating the schema.


I'm going to go here, and I actually want to transform this content. I really just want to parse out the job ID, so I'm going to generate a regular expression automatically just to pull that job ID out. If you know regular expressions, you know that it's pulling the text that comes after a colon in a


have to write that stuff from scratch. That's a really big time saver for your


Here I'm just going to get the general


you know, the job description. I call it JD. That's a lot of content.
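
For readers less familiar with regular expressions, the job ID transformation described earlier (pulling the text that comes after a colon) can be sketched in Python as follows. This is an illustrative assumption, not the expression the tool generated in the demo: the sample string `"Job ID: 12345"`, the pattern, and the function name `extract_after_colon` are all hypothetical.

```python
import re
from typing import Optional

# Capture whatever follows the first colon, trimming surrounding whitespace.
AFTER_COLON = re.compile(r":\s*(.+?)\s*$")


def extract_after_colon(text: str) -> Optional[str]:
    """Return the text after the first colon, or None if there is no match."""
    match = AFTER_COLON.search(text)
    return match.group(1) if match else None


# Hypothetical captured content; the real page text is not shown in the webinar.
print(extract_after_colon("Job ID: 12345"))
```

Returning `None` when there is no colon keeps a missing job ID distinguishable from an empty one.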

That's all I'm going to get right now. I'm showing you that there's the schema in the background, and there are these different tabs that it has created. I'm going to go ahead and save it. Yes. Now I'm going to run it in debug mode.


You can see that when I load it up in debug mode, it's actually loading the page and displaying it to me in real time, so I can see exactly what the agent is doing. It's clicking on the various elements that I care about, capturing the data, and going down the list automatically. It's moving really slowly because it has to render the whole page; usually it would move a lot faster.

I'm going to stop this because I don't think we need to stare at all of that, but I'm going to


the internal data. Whoops. Why does that always happen? Oh, I know why: because I didn't specify


Let's see. Let's get my data exported here. I'm specifying the export target, and now I'm going to go back over here and view the export target. No, still getting an error. Here we go.

These are the different pages. This is all the internal data that it would then process into your title, your job ID, and your job description. This is what it would export.

Now, if I wanted to... I should just stop right here, shouldn't I? I don't want